nlp crash course
Word Embeddings: An NLP Crash Course
The field of natural language processing (NLP) makes it possible to understand patterns in large amounts of language data, from online reviews to audio recordings. But before a data scientist can really dig into an NLP problem, he or she must lay the groundwork that helps a model make sense of the different units of language it will encounter. Word embeddings are a set of feature engineering techniques widely used in predictive NLP modeling, particularly in deep learning applications. Word embeddings transform sparse vector representations of words into a dense, continuous vector space, enabling you to identify similarities between words and phrases -- on a large scale -- based on their context. In this piece, I'll explain the reasoning behind word embeddings and demostrate how to use these techniques to create clusters of similar words using data from 500,000 Amazon reviews of food. You can download the dataset to follow along.